
PaperBanana: Automating Academic Illustration for AI Scientists